Sentence-level control vectors for deep neural network speech synthesis
Authors
Abstract
This paper describes the use of a low-dimensional vector representation of sentence acoustics to control the output of a feed-forward deep neural network text-to-speech system on a sentence-by-sentence basis. Vector representations for sentences in the training corpus are learned during network training along with the other parameters of the model. Although the network is trained on a frame-by-frame basis, the standard frame-level inputs representing linguistic features are supplemented by features from a projection layer which outputs a learned representation of sentence-level acoustic characteristics. The projection layer contains dedicated parameters for each sentence in the training data which are optimised jointly with the standard network weights. Sentence-specific parameters are optimised on all frames of the relevant sentence – these parameters therefore allow the network to account for sentence-level variation in the data which is not predictable from the standard linguistic inputs. Results show that the global prosodic characteristics of synthetic speech can be controlled simply and robustly at run time by supplementing basic linguistic features with sentence-level control vectors which are novel but designed to be consistent with those observed in the training corpus.
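The architecture described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all dimensions, initialisations, and the two-layer network are assumptions chosen for clarity. The key idea it shows is the projection layer — a lookup table with one learned row per training sentence, concatenated with the frame-level linguistic features — whose rows are trained jointly with the network weights and replaced by a novel control vector at synthesis time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 100 training sentences,
# a 6-dim control vector, 300 frame-level linguistic features.
n_sentences, ctrl_dim, ling_dim, hidden_dim, out_dim = 100, 6, 300, 256, 60

# Projection-layer parameters: one learned control vector per training
# sentence, optimised jointly with the ordinary network weights.
sentence_embeddings = rng.normal(scale=0.1, size=(n_sentences, ctrl_dim))

W1 = rng.normal(scale=0.01, size=(ling_dim + ctrl_dim, hidden_dim))
W2 = rng.normal(scale=0.01, size=(hidden_dim, out_dim))

def forward(ling_feats, control_vec):
    """One frame: concatenate the linguistic features with the
    sentence-level control vector, then run the feed-forward stack."""
    x = np.concatenate([ling_feats, control_vec])
    h = np.tanh(x @ W1)
    return h @ W2  # acoustic parameters for this frame

# Training time: every frame of sentence 17 shares embedding row 17,
# so gradients for that row accumulate over all frames of the sentence.
frame = rng.normal(size=ling_dim)
y_train = forward(frame, sentence_embeddings[17])

# Run time: a novel control vector (here, the corpus mean) steers the
# global prosody of the synthesised sentence.
novel_ctrl = sentence_embeddings.mean(axis=0)
y_synth = forward(frame, novel_ctrl)
print(y_train.shape, y_synth.shape)  # (60,) (60,)
```

Because the control vector is constant across all frames of a sentence, it can only absorb sentence-level variation (e.g. global prosody) rather than frame-level detail, which is what makes it usable as a control knob at run time.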
Similar papers
Global Syllable Vectors for Building TTS Front-End with Deep Learning
Recent vector space representations of words have succeeded in capturing syntactic and semantic regularities. In the context of text-to-speech (TTS) synthesis, a front-end is a key component for extracting multi-level linguistic features from text, where the syllable acts as a link between low- and high-level features. This paper describes the use of global syllable vectors as features to build a fro...
Character-Based Text Classification using Top Down Semantic Model for Sentence Representation
Despite the success of deep learning on many fronts, especially image and speech, its application in text classification is often still not as good as a simple linear SVM on n-gram TF-IDF representations, especially for smaller datasets. Deep learning tends to emphasize sentence-level semantics when learning a representation with models like recurrent neural networks or recursive neural networks,...
News Authorship Identification with Deep Learning
Authorship identification finds the most likely author from a group of candidate authors for academic articles, news, emails and forum messages. It can be applied to find the original author of an uncited article, to detect plagiarism and to classify spam/non-spam messages. In this project, we tackled this classification task at the author, article, sentence and word level...
Attentive Tensor Product Learning for Language Generation and Grammar Parsing
This paper proposes a new architecture — Attentive Tensor Product Learning (ATPL) — to represent grammatical structures in deep learning models. ATPL bridges this gap by exploiting Tensor Product Representations (TPR), a structured neural-symbolic model developed in cognitive science, aiming to integrate deep learning with explicit language structures and rules. The key ...
Speaker Adaptation in DNN-Based Speech Synthesis Using d-Vectors
The paper presents a mechanism to perform speaker adaptation in speech synthesis based on deep neural networks (DNNs). The mechanism extracts speaker identification vectors, so-called d-vectors, from the training speakers and uses them jointly with the linguistic features to train a multi-speaker DNN-based text-to-speech synthesizer (DNN-TTS). The d-vectors are derived by applying principal compo...
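The d-vector mechanism in the snippet above parallels the sentence-level control vectors of the main paper: a low-dimensional identity vector is appended to the linguistic input. A minimal sketch, assuming hypothetical dimensions and per-speaker summary statistics (the snippet is truncated before the details of the PCA step, so this only illustrates the general pattern of PCA-reduced speaker vectors concatenated with linguistic features):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: one summary vector per training speaker
# (e.g. averaged hidden activations), for 20 speakers.
n_speakers, stat_dim, d_dim, ling_dim = 20, 128, 8, 300
speaker_stats = rng.normal(size=(n_speakers, stat_dim))

# PCA via SVD on the mean-centred statistics: project each speaker
# onto the top d_dim principal directions to get a low-dim d-vector.
centred = speaker_stats - speaker_stats.mean(axis=0)
_, _, Vt = np.linalg.svd(centred, full_matrices=False)
d_vectors = centred @ Vt[:d_dim].T  # shape (20, 8)

# Multi-speaker training input for one frame: linguistic features
# concatenated with the d-vector of the frame's speaker.
ling_feats = rng.normal(size=ling_dim)
net_input = np.concatenate([ling_feats, d_vectors[3]])  # speaker 3
print(net_input.shape)  # (308,)
```

At adaptation time, a d-vector for an unseen speaker would be projected through the same PCA basis and swapped in without retraining the network weights.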
Journal:
Volume Issue
Pages -
Publication date: 2015